
Conversation


@visz11 visz11 commented Oct 14, 2025

User description

Delete this text and replace it with a detailed description of your change. The
PR title and body will become the squashed commit message.

If you would like to tag individuals, add some commentary, upload images, or
include other supplemental information that should not be part of the eventual
commit message, please use a separate comment.

If applicable, please include a summary of the testing strategy (including
rationale) for the proposed change. Unit and/or integration tests are expected
for any behavior change and system tests should be considered for larger
changes.


CodeAnt-AI Description

Add deterministic topic and group hashing to group coordinator

What Changed

  • Added a stable topic-hash function that produces a topic-level hash from a topic's id, name, partition count, and each partition's rack list (racks are sorted per partition). The hash changes when any of those fields change.
  • Added a group-hash function that orders topic names and combines their topic hashes into a single group-level hash. The group hash changes if topic hashes or topic ordering change.
  • Added unit tests that verify the topic hash is sensitive to changes in topic id, name, partition count, partition rack contents and order, and that group hash depends on the ordered list of topic hashes.
  • Updated import control to allow the hashing library used by the new code.

Impact

✅ Detect topic metadata changes
✅ Deterministic group hashes for metadata comparison
✅ Verified sensitivity to id/name/partition/rack and ordering changes
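The scheme described above can be sketched as follows. This is a simplified, stdlib-only approximation for illustration: the PR itself uses Guava's Murmur3-128 and combineOrdered, so the FNV-1a mixing and all helper names here are hypothetical stand-ins.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupHashSketch {
    // 64-bit FNV-1a over a canonical UTF-8 string (stand-in for Murmur3-128).
    static long fnv1a(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xffL);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Topic hash: id, name, partition count, then per-partition sorted racks.
    static long topicHash(String topicId, String name, Map<Integer, List<String>> partitionRacks) {
        StringBuilder sb = new StringBuilder()
            .append(0)                      // version marker, mirroring the PR's magic byte
            .append('|').append(topicId)
            .append('|').append(name)
            .append('|').append(partitionRacks.size());
        new TreeMap<>(partitionRacks).forEach((partition, racks) -> sb
            .append('|').append(partition)  // partition ids in ascending order
            .append('|').append(racks.stream().sorted().collect(Collectors.joining(";"))));
        return fnv1a(sb.toString());
    }

    // Group hash: combine topic hashes in sorted topic-name order.
    static long groupHash(Map<String, Long> topicHashes) {
        long h = 0xcbf29ce484222325L;
        for (long t : new TreeMap<>(topicHashes).values()) {
            h = h * 31 + t;
        }
        return h;
    }
}
```

As in the PR, racks are sorted per partition, so replica-registration order does not affect the hash, while any change to id, name, partition count, or rack contents does.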

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

Summary by CodeRabbit

  • New Features

    • Added deterministic hashing for topics and groups in the coordinator.
  • Tests

    • Introduced unit tests validating hashing stability, ordering, and sensitivity to metadata changes.
  • Chores

    • Added a hashing library to build configuration.
    • Updated import rules to allow required hashing packages.


codeant-ai bot commented Oct 14, 2025

CodeAnt AI is reviewing your PR.



@github-actions github-actions bot added the build label Oct 14, 2025
@gemini-code-assist

Summary of Changes

Hello @visz11, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces new hashing utilities within the group-coordinator module to deterministically compute hashes for Kafka topics and consumer groups. This functionality is crucial for maintaining consistent state representation and enabling efficient comparisons or lookups based on these hashes. The changes include adding the Guava library as a dependency and implementing the hashing logic, along with thorough unit tests to validate its correctness and determinism.

Highlights

  • New Dependency: The Google Guava library, specifically its hashing utilities, has been added as a new dependency to the group-coordinator module.
  • Topic Hashing Function: A new static method, computeTopicHash, has been introduced in the Group class. This method generates a deterministic Murmur3-128 hash for a Kafka topic, incorporating its ID, name, number of partitions, and the sorted rack assignments of its replicas.
  • Group Hashing Function: A new static method, computeGroupHash, has been added to the Group class. This function computes a combined hash for a consumer group by deterministically ordering and combining the hashes of its associated topics.
  • Comprehensive Unit Tests: A new test file, GroupTest.java, has been added with extensive unit tests to validate the correctness and determinism of both computeTopicHash and computeGroupHash under various scenarios, including changes in magic bytes, partition order, rack order, and different topic fields.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review — /gemini review: Performs a code review for the current pull request in its current state.
  • Pull Request Summary — /gemini summary: Provides a summary of the current pull request in its current state.
  • Comment — @gemini-code-assist: Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help — /gemini help: Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


coderabbitai bot commented Oct 14, 2025

Walkthrough

Added Guava as a dependency and import allowance. Introduced deterministic hashing utilities in Group for topics and groups using Murmur3, leveraging cluster and topic metadata. Added comprehensive unit tests validating hashing stability and sensitivity. Updated gradle dependency catalogs and versions, and checkstyle import controls.

Changes

Cohort / File(s) — Summary of Changes

  • Dependency and build configuration — build.gradle, gradle/dependencies.gradle: Added Guava (com.google.guava:guava:33.4.0-jre) to versions and libs; included as an implementation dependency in the group-coordinator module.
  • Import control configuration — checkstyle/import-control-group-coordinator.xml: Allowed imports for com.google.common.hash in coordinator and metrics subtrees.
  • Group hashing utilities — group-coordinator/src/main/java/.../Group.java: Added static methods computeTopicHash(TopicImage, ClusterImage) and computeGroupHash(Map<String, Long>); added related imports.
  • Unit tests for hashing — group-coordinator/src/test/java/.../GroupTest.java: New JUnit 5 tests covering deterministic hashing, ordering effects, rack composition, and image variations for topic and group hashes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitch my ears at hashes new,
Murmur3 winds the racks right through.
Sorted names, partitions too—
Deterministic trails I chew.
With Guava leaves, I crunch and play,
Group by group, I bound away.
Thump! The tests all pass today. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description Check — ⚠️ Warning: The pull request description still contains the unmodified placeholder template text and does not include a user-authored summary of the changes, rationale, or testing strategy as required by the repository's template. Resolution: replace the placeholder text with a detailed description of your change, including a summary of the implemented features, testing strategy, and any relevant context in accordance with the repository's description template.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 27.27%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title Check — ✅ Passed: The pull request title "KAFKA-17747: Add compute topic and group hash" succinctly identifies the main change of introducing topic-level and group-level hashing methods and aligns closely with the implemented code additions.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch clone-KAFKA-17747-2


Comment @coderabbitai help to get the list of available commands and usage tips.

@codeant-ai codeant-ai bot added the size:L This PR changes 100-499 lines, ignoring generated files label Oct 14, 2025

refacto-visz bot commented Oct 14, 2025

KAFKA-17747: Add compute topic and group hash

TL;DR: Adds hash computation utilities for topics and groups using Guava's Murmur3 hashing with comprehensive test coverage.


Refacto PR Summary

Implements deterministic hash computation methods for Kafka topics and consumer groups using Google Guava's Murmur3 algorithm.
Adds static methods to compute topic hashes based on metadata (ID, name, partitions, rack topology) and group hashes from topic hash collections.

This PR introduces hash computation capabilities for Kafka's group coordinator module, enabling consistent fingerprinting of topic metadata and group compositions. The implementation uses Murmur3 128-bit hashing for topic fingerprints, incorporating topic ID, name, partition count, and rack distribution. Group hashes combine ordered topic hashes using Guava's combineOrdered function. The changes include comprehensive unit tests validating hash consistency, field sensitivity, and ordering requirements.

Change Highlights

Click to expand
  • build.gradle: Adds Google Guava dependency to group-coordinator project
  • gradle/dependencies.gradle: Defines Guava 33.4.0-jre version and library reference
  • checkstyle/import-control-group-coordinator.xml: Allows com.google.common.hash package imports
  • Group.java: Implements computeTopicHash() and computeGroupHash() static methods using Murmur3
  • GroupTest.java: Comprehensive test suite validating hash computation correctness and consistency

Sequence Diagram

sequenceDiagram
    participant C as Coordinator
    participant G as Group
    participant T as TopicImage
    participant CI as ClusterImage
    participant H as Hasher
    
    C->>G: computeTopicHash(topicImage, clusterImage)
    G->>H: newHasher()
    G->>H: putByte(0) magic
    G->>T: getId(), getName(), partitions()
    T-->>G: topic metadata
    G->>H: putLong(id), putString(name), putInt(partitions)
    G->>CI: getBrokerRacks()
    CI-->>G: rack information
    G->>H: putInt(partitionId), putString(racks)
    H-->>G: hash.asLong()
    C->>G: computeGroupHash(topicHashes)
    G->>H: combineOrdered(sortedHashes)
    H-->>C: combined hash

Testing Guide

Click to expand
  1. Hash consistency: Run GroupTest.testComputeTopicHash() to verify identical inputs produce identical hashes
  2. Field sensitivity: Execute parameterized tests to confirm different topic IDs, names, partitions, or racks generate different hashes
  3. Order independence: Test computeGroupHash() with different topic insertion orders, verify consistent output due to internal sorting
  4. Magic byte validation: Verify hash changes when magic byte differs, ensuring version compatibility
  5. Rack ordering: Confirm sorted rack strings produce deterministic hashes regardless of broker registration order


visz11 commented Oct 14, 2025

/refacto-visz


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces hashing functions for topics and groups using the Guava library, which is a valuable addition for tracking metadata changes. The implementation is sound, and the accompanying tests are thorough. I have a few suggestions to enhance the robustness and maintainability of the new code. My recommendations include using a more appropriate hashing method for UUIDs, simplifying a stream operation, and refactoring some duplicated code in the tests.

HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
.putByte((byte) 0) // magic byte
.putLong(topicImage.id().hashCode()) // topic Id


medium

Using topicImage.id().hashCode() for hashing the Uuid is not ideal, as it truncates the 128-bit UUID to a 32-bit integer hash, increasing the potential for collisions. Kafka's Uuid exposes getMostSignificantBits() and getLeastSignificantBits(); feeding both longs to the hasher uses the full 128 bits of the UUID and will make the hash more robust.

Please note that this change will require updating the corresponding logic in GroupTest.java.

Suggested change
.putLong(topicImage.id().hashCode()) // topic Id
.putLong(topicImage.id().getMostSignificantBits()) // topic Id (high 64 bits)
.putLong(topicImage.id().getLeastSignificantBits()) // topic Id (low 64 bits)
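For reference, a small stdlib sketch of what "full 128 bits" means next to the lossy hashCode() path. It uses java.util.UUID for illustration; Kafka's Uuid exposes analogous getMostSignificantBits()/getLeastSignificantBits() accessors, and the sample UUID value is arbitrary.

```java
import java.util.UUID;

public class UuidHashDemo {
    static final UUID SAMPLE = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");

    // hashCode() collapses the 128-bit value to 32 bits (lossy, collision-prone).
    static int collapsed() {
        return SAMPLE.hashCode();
    }

    // Feeding both halves as longs preserves every bit of the UUID.
    static String msbHex() {
        return Long.toHexString(SAMPLE.getMostSignificantBits());
    }

    static String lsbHex() {
        return Long.toHexString(SAMPLE.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(collapsed())); // 32-bit digest, lossy
        System.out.println(msbHex()); // 123e4567e89b12d3
        System.out.println(lsbHex()); // a456426614174000
    }
}
```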

Comment on lines +260 to +261
.filter(Optional::isPresent)
.map(Optional::get)


medium

The pattern .filter(Optional::isPresent).map(Optional::get) can be simplified to .flatMap(Optional::stream) since your project's Java version supports it. This change will make the stream pipeline more concise and idiomatic.

Suggested change
.filter(Optional::isPresent)
.map(Optional::get)
.flatMap(Optional::stream)
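A quick stdlib check of the equivalence the review points out; the rack values here are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class OptionalStreamDemo {
    static final List<Optional<String>> RACKS =
        List.of(Optional.of("rack1"), Optional.empty(), Optional.of("rack0"));

    // Verbose form flagged in the review comment above.
    static String joinVerbose() {
        return RACKS.stream()
            .filter(Optional::isPresent)
            .map(Optional::get)
            .sorted()
            .collect(Collectors.joining(";"));
    }

    // Concise Java 9+ form: Optional.stream() yields zero or one elements.
    static String joinConcise() {
        return RACKS.stream()
            .flatMap(Optional::stream)
            .sorted()
            .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        System.out.println(joinVerbose()); // rack0;rack1
        System.out.println(joinConcise()); // rack0;rack1
    }
}
```

Both pipelines drop empty Optionals and keep present values, so the outputs are identical; the flatMap form simply fuses the two steps.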

Comment on lines +50 to +65
@Test
void testComputeTopicHash() {
long result = Group.computeTopicHash(FOO_METADATA_IMAGE.topics().getTopic(FOO_TOPIC_ID), FOO_METADATA_IMAGE.cluster());

HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
.putByte((byte) 0) // magic byte
.putLong(FOO_TOPIC_ID.hashCode()) // topic Id
.putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
.putInt(FOO_NUM_PARTITIONS) // number of partitions
.putInt(0) // partition 0
.putString("rack0;rack1", StandardCharsets.UTF_8) // rack of partition 0
.putInt(1) // partition 1
.putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
assertEquals(topicHasher.hash().asLong(), result);
}


medium

There is significant code duplication across the testComputeTopicHash* methods. The logic for building the hash is repeated in testComputeTopicHash, testComputeTopicHashWithDifferentMagicByte, testComputeTopicHashWithDifferentPartitionOrder, and testComputeTopicHashWithDifferentRackOrder. This makes the tests brittle and harder to maintain, as any change to the hashing logic in Group.computeTopicHash would require updates in multiple test methods.

Consider refactoring this logic into a private helper method within the test class. This would centralize the hash construction, making the tests cleaner and more maintainable.


refacto-visz bot commented Oct 14, 2025

Refacto is reviewing this PR. Please wait for the review comments to be posted.

<allow pkg="org.apache.kafka.coordinator.common.runtime" />
<allow pkg="com.google.re2j" />
<allow pkg="org.apache.kafka.metadata" />
<allow pkg="com.google.common.hash" />


Suggestion: Narrow the permission surface by marking the package as exact-match to avoid accidentally allowing subpackages under com.google.common.hash. [security]

Suggested change
<allow pkg="com.google.common.hash" />
<allow pkg="com.google.common.hash" exact-match="true" />
Why Change? ⭐

The proposed change only adds the attribute exact-match="true" which is already used elsewhere in this import-control file (e.g. other <allow ... exact-match="true" /> lines), so the attribute is supported by the DTD referenced at the top of the file and will not introduce a syntax error.

This modification narrows the permission surface by allowing only the exact package com.google.common.hash and not its subpackages. It is syntactically valid XML and consistent with the surrounding entries. Assumptions:

  • The import-control DTD used by this project supports the exact-match attribute (evidence: other lines in this file use it).
  • No runtime code requires implicit access to subpackages under com.google.common.hash; if such access is required, this change would be a deliberate tightening of permissions rather than a syntactic bug.

Given these points, the change is safe from a syntax/execution perspective and does not introduce errors into the configuration itself.

HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
.putByte((byte) 0) // magic byte
.putLong(topicImage.id().hashCode()) // topic Id


Suggestion: Use a deterministic string representation of the topic id (e.g. id().toString()) instead of hashCode() when feeding the hasher to avoid collisions and ensure consistent hashing across JVMs. [possible bug]

Suggested change
.putLong(topicImage.id().hashCode()) // topic Id
.putString(topicImage.id().toString(), StandardCharsets.UTF_8) // topic Id as string
Why Change? ⭐

The improved code is syntactically correct and uses only symbols present in the file:

  • Hasher.putString(CharSequence, Charset) is a valid Guava Hasher API and StandardCharsets is already imported.
  • topicImage.id().toString() is a safe replacement for topicImage.id().hashCode(): it produces a deterministic, textual representation of the id (which is typically a UUID or equivalent), avoiding reliance on JVM-specific hashCode implementations and reducing collision risk.
  • The change does not introduce additional null-safety concerns beyond the original (both .hashCode() and .toString() would NPE if id() is null).

Assumptions and rationale:

  • topicImage.id() has a meaningful, stable toString() (which is true for UUIDs and Kafka Uuid types).
  • The change intentionally alters the bytes fed into the hasher to a deterministic textual form; this is a compatible internal change to the hashing input and compiles without additional imports.

Given these verifiable points and that the edit is local and uses existing imports/APIs, this suggestion is classified as verified.


refacto-visz bot commented Oct 14, 2025

Code Review: Group Coordinator Hash Implementation

👍 Well Done
Comprehensive Test Coverage

Thorough unit tests validate hash sensitivity to all metadata changes

Deterministic Hash Implementation

Uses Murmur3 hashing with proper ordering for consistent results across environments

📁 Selected files for review (5)
  • build.gradle
  • checkstyle/import-control-group-coordinator.xml
  • gradle/dependencies.gradle
  • group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java
  • group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java
🎯 Custom Instructions
✅ Applied Instructions
Organization Guidelines
  • Avoid commented code in actual codebases
  • Avoid hardcoding of variables

Scope: All files

📝 Additional Comments
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (7)
Hash Collision Risk

Using hashCode() on UUID introduces unnecessary hash collision risk in topic hash calculation. UUID already provides good distribution, but hashCode() reduces 128-bit UUID to 32-bit integer. Consider using UUID bytes directly or most/least significant bits for better hash quality.

Standards:

  • Algorithm-Correctness-Hash-Quality
  • Mathematical-Accuracy-Distribution
Null Safety Enhancement

Null parameter validation missing for topicImage and clusterImage inputs. NullPointerException during hash computation would cause coordinator service failures. Defensive null checks would prevent service disruption from invalid metadata states.

Standards:

  • ISO-IEC-25010-Reliability-Fault-Tolerance
  • DbC-Preconditions
  • ISO-IEC-25010-Functional-Correctness-Appropriateness
Stream Allocation Optimization

Stream pipeline creates intermediate sorted collection and maps to HashCode objects before final collection. Each map operation allocates new HashCode instances causing memory pressure during group hash computation. Consider pre-sizing collections or using primitive-optimized operations for better memory efficiency.

Standards:

  • ISO-IEC-25010-Performance-Efficiency-Resource-Utilization
  • Memory-Allocation-Optimization
  • Stream-Processing-Efficiency
Hash Collision Risk

Group hash computation combines multiple topic hashes without collision detection or validation. While Murmur3 has good distribution properties, combining many hashes increases collision probability. Consider implementing collision detection or using a more robust combining strategy for production systems handling large numbers of topics.

Standards:

  • CWE-328
  • NIST-SSDF-PW.1
Empty Rack Handling

When all brokers have no rack information, the racks string becomes empty. This could lead to identical hash contributions from partitions with different replica sets but no rack data. Consider adding partition replica IDs as fallback to maintain hash differentiation when rack information is unavailable.

Standards:

  • Algorithm-Correctness-Edge-Cases
  • Business-Rule-Hash-Uniqueness
Extract Hash Constants

Hash algorithm and magic byte are hardcoded within method implementation. Extracting these as named constants improves readability and enables easier algorithm updates. Consider static final fields for HASH_ALGORITHM and TOPIC_HASH_VERSION.

Standards:

  • Clean-Code-Constants
  • SOLID-OCP
  • Maintainability-Quality-Flexibility
Missing Input Validation

Hash computation methods accept parameters without null validation or input sanitization. Malformed or null inputs could cause unexpected behavior or exceptions during hash calculation. Adding input validation would improve robustness and prevent potential denial of service through malformed metadata.

Standards:

  • CWE-20
  • OWASP-A03
  • NIST-SSDF-PW.1

.map(Optional::get)
.sorted()
.collect(Collectors.joining(";"));
topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"


Missing Import Statement

Code uses Collectors.joining() but missing import for java.util.stream.Collectors. This will cause compilation failure when the new computeTopicHash method is invoked. The stream operation cannot resolve Collectors without proper import declaration.

import java.util.stream.Collectors;
Committable Suggestion
Suggested change
Add the missing import to the import block of Group.java (keeping the putString line unchanged):
import java.util.stream.Collectors;
Standards
  • Algorithm-Correctness-Dependency-Resolution
  • Logic-Verification-Compilation-Safety

HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
.putByte((byte) 0) // magic byte
.putLong(topicImage.id().hashCode()) // topic Id


Hardcoded Magic Byte

Magic byte value hardcoded as constant zero violates organization guideline against hardcoding variables. This reduces hash algorithm flexibility and prevents versioning of hash computation logic.

Standards
  • Org-Guideline-Hardcoding of variables
  • CWE-547
  • OWASP-A05

Comment on lines +255 to +265
topicHasher.putInt(entry.getKey()); // partition id
String racks = Arrays.stream(entry.getValue().replicas)
.mapToObj(clusterImage::broker)
.filter(Objects::nonNull)
.map(BrokerRegistration::rack)
.filter(Optional::isPresent)
.map(Optional::get)
.sorted()
.collect(Collectors.joining(";"));
topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
});


Stream Processing Inefficiency

Nested stream processing creates multiple intermediate collections for each partition. Inner stream operations execute for every partition iteration causing O(n*m) complexity where n=partitions, m=replicas. Performance degrades with partition count and replica factor affecting hash computation scalability.

Standards
  • ISO-IEC-25010-Performance-Efficiency-Time-Behavior
  • Optimization-Pattern-Stream-Efficiency
  • Algorithmic-Complexity-Nested-Operations

HashFunction hf = Hashing.murmur3_128();
Hasher topicHasher = hf.newHasher()
.putByte((byte) 0) // magic byte
.putLong(topicImage.id().hashCode()) // topic Id


Magic Byte Documentation

Magic byte value lacks documentation explaining its purpose and versioning strategy. Future hash algorithm changes may require different magic bytes for backward compatibility. Adding constant with descriptive name improves maintainability.

Standards
  • Clean-Code-Comments
  • Maintainability-Quality-Documentation


codeant-ai bot commented Oct 14, 2025

Pull Request Feedback 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • New third-party dependency
    The import-control now allows com.google.common.hash. Verify that the coordinator module declares an explicit, pinned dependency (e.g. Guava or the specific artifact providing that package) in the module build files and that the chosen version has been audited for compatibility and CVEs. Also confirm the dependency is available at runtime for the coordinator code that will use it.

  • Build & CI impact
    Ensure CI/build scripts are updated so that the new import-control entry doesn't cause unexpected failures (missing dependency, classpath differences across test vs prod). Confirm unit/integration tests are run with the same dependency set used in production packaging.

  • Map iteration order assumption
    The group-hash tests construct a Map using Map.of(...) and assume a specific ordering when building the expected Hashing.combineOrdered(...) input list. Iteration order for some map implementations is not guaranteed; ensure Group.computeGroupHash's contract about ordering is explicit, and make the test construct an ordered collection (e.g. LinkedHashMap or ordered list) if ordering is important.

  • Topic-id hashing
    The tests (and the manual expected-hash construction) use Uuid.hashCode() and feed that into Hasher.putLong(...). Relying on hashCode() may be brittle or lossy compared to using the full UUID value; it may not uniquely represent the UUID bits and could lead to accidental collisions or mismatch with production code that encodes UUIDs differently.

  • Hard-coded rack strings
    The expected hash is built using hard-coded rack serialization strings like "rack0;rack1". This ties the test to a specific representation and ordering of racks produced by MetadataImageBuilder.addRacks() and can make the test fragile if the representation/ordering changes. Prefer deriving the expected rack string directly from the metadata object under test.

  • Import-control scope
    Review whether allowing com.google.common.hash is narrowly scoped enough. Allowing a package can broaden the permitted API surface; ensure this entry is limited to the minimal package(s) required by the new hashing implementation and does not unintentionally permit unrelated Guava subpackages.
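On the "Map iteration order assumption" point above, a minimal stdlib illustration of making the ordering explicit before combining hashes; the topic names and hash values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderedHashesDemo {
    // Map.of makes no iteration-order guarantee, so derive an explicit
    // ordering (here: sorted by topic name) before feeding hashes to an
    // order-sensitive combiner such as Hashing.combineOrdered.
    static List<Long> hashesInNameOrder(Map<String, Long> topicHashes) {
        return new ArrayList<>(new TreeMap<>(topicHashes).values());
    }

    public static void main(String[] args) {
        Map<String, Long> hashes = Map.of("foo", 1L, "bar", 2L);
        System.out.println(hashesInNameOrder(hashes)); // [2, 1] ("bar" sorts before "foo")
    }
}
```

Tests that build an expected combined hash should use the same explicit ordering rather than relying on whatever order Map.of happens to iterate in.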


codeant-ai bot commented Oct 14, 2025

CodeAnt AI finished reviewing your PR.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (1)

119-160: Optional: add an invariance test for replica order.

Consider a test that swaps replica order for a partition but preserves racks; the hash should stay the same since racks are sorted.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed79212 and 36cb999.

📒 Files selected for processing (5)
  • build.gradle (1 hunks)
  • checkstyle/import-control-group-coordinator.xml (1 hunks)
  • gradle/dependencies.gradle (2 hunks)
  • group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (2 hunks)
  • group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (3)
metadata/src/main/java/org/apache/kafka/image/ClusterImage.java (1)
  • ClusterImage (36-108)
metadata/src/main/java/org/apache/kafka/image/TopicImage.java (1)
  • TopicImage (37-92)
metadata/src/main/java/org/apache/kafka/metadata/BrokerRegistration.java (1)
  • BrokerRegistration (44-398)
group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (2)
clients/src/main/java/org/apache/kafka/common/Uuid.java (1)
  • Uuid (32-196)
metadata/src/main/java/org/apache/kafka/image/MetadataImage.java (1)
  • MetadataImage (33-200)
🔇 Additional comments (7)
build.gradle (1)

1423-1424: Guava dep addition looks correct; ensure LICENSE update.

Dependency wiring is fine for group-coordinator. Please update LICENSE-binary to account for Guava and any transitive artifacts.

gradle/dependencies.gradle (2)

64-65: Version pin for Guava is fine; please confirm license artifacts.

Adding versions.guava is consistent. Ensure binary LICENSE files are updated accordingly.


151-152: libs mapping OK.

libs.guava definition is correct and matches the version entry.

checkstyle/import-control-group-coordinator.xml (1)

80-81: Import allowance scoped appropriately.

Allowing only com.google.common.hash is precise and aligned with usage.

group-coordinator/src/main/java/org/apache/kafka/coordinator/group/Group.java (1)

229-237: Deterministic group hash over sorted topics: LGTM.

Sorting by topic name and combining ordered 64-bit hashes is clear and stable.
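The combining step can be sketched with plain JDK types. This is an illustrative stand-in, not the actual `Group` implementation (which hashes with Guava's murmur3_128); the class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class GroupHashSketch {
    // Combine per-topic hashes in sorted-name order so the result is
    // deterministic regardless of map iteration order, yet sensitive to
    // both the hash values and the sorted ordering.
    public static long combine(Map<String, Long> topicHashes) {
        List<String> names = new ArrayList<>(topicHashes.keySet());
        Collections.sort(names);
        long h = 1;
        for (String name : names) {
            h = 31 * h + topicHashes.get(name);
        }
        return h;
    }

    public static void main(String[] args) {
        // Same entries, different insertion order: same combined hash.
        System.out.println(combine(Map.of("foo", 1L, "bar", 2L))
            == combine(Map.of("bar", 2L, "foo", 1L))); // true
    }
}
```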

group-coordinator/src/test/java/org/apache/kafka/coordinator/group/GroupTest.java (2)

162-188: Group hash tests: LGTM.

They capture ordering sensitivity as intended.


50-65: Ignore suggested test update: Group.computeTopicHash still uses topicImage.id().hashCode(), so the expected hash in the test should remain based on .hashCode().

Likely an incorrect or invalid review comment.

Comment on lines +246 to +267
static long computeTopicHash(TopicImage topicImage, ClusterImage clusterImage) {
    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(topicImage.id().hashCode()) // topic Id
        .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
        .putInt(topicImage.partitions().size()); // number of partitions

    topicImage.partitions().entrySet().stream().sorted(Map.Entry.comparingByKey()).forEach(entry -> {
        topicHasher.putInt(entry.getKey()); // partition id
        String racks = Arrays.stream(entry.getValue().replicas)
            .mapToObj(clusterImage::broker)
            .filter(Objects::nonNull)
            .map(BrokerRegistration::rack)
            .filter(Optional::isPresent)
            .map(Optional::get)
            .sorted()
            .collect(Collectors.joining(";"));
        topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
    });
    return topicHasher.hash().asLong();
}

⚠️ Potential issue | 🔴 Critical

Use full 128-bit UUID; avoid ambiguous rack concatenation.

  • Bug: putLong(topicImage.id().hashCode()) collapses the 128-bit UUID to a 32-bit int, increasing collision risk. Use the most/least significant bits instead.
  • Improvement: Joining racks with ";" can collide when rack strings themselves contain ";" (e.g., "a;" + "b" vs "a" + ";b"). Encode lengths instead.

Apply this essential fix for UUID handling:

-        Hasher topicHasher = hf.newHasher()
-            .putByte((byte) 0) // magic byte
-            .putLong(topicImage.id().hashCode()) // topic Id
-            .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
-            .putInt(topicImage.partitions().size()); // number of partitions
+        Hasher topicHasher = hf.newHasher()
+            .putByte((byte) 0) // magic byte
+            // topic id: use full 128 bits to avoid collisions
+            .putLong(topicImage.id().getMostSignificantBits())
+            .putLong(topicImage.id().getLeastSignificantBits())
+            .putString(topicImage.name(), StandardCharsets.UTF_8) // topic name
+            .putInt(topicImage.partitions().size()); // number of partitions

Optionally harden rack encoding to avoid separator ambiguity:

-            String racks = Arrays.stream(entry.getValue().replicas)
-                .mapToObj(clusterImage::broker)
-                .filter(Objects::nonNull)
-                .map(BrokerRegistration::rack)
-                .filter(Optional::isPresent)
-                .map(Optional::get)
-                .sorted()
-                .collect(Collectors.joining(";"));
-            topicHasher.putString(racks, StandardCharsets.UTF_8); // sorted racks with separator ";"
+            List<String> racks = Arrays.stream(entry.getValue().replicas)
+                .mapToObj(clusterImage::broker)
+                .filter(Objects::nonNull)
+                .map(BrokerRegistration::rack)
+                .flatMap(Optional::stream)
+                .sorted()
+                .toList();
+            topicHasher.putInt(racks.size());
+            for (String rack : racks) {
+                topicHasher.putInt(rack.length());
+                topicHasher.putString(rack, StandardCharsets.UTF_8);
+            }
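To see why length-prefixing matters, here is a small stdlib-only sketch contrasting the two encodings; the class and method names are illustrative, not part of the PR:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class RackEncodingDemo {
    // Naive encoding: joining racks with ";" is ambiguous when a rack
    // string itself contains the separator.
    public static String joinEncode(List<String> racks) {
        return String.join(";", racks);
    }

    // Length-prefixed encoding: count, then (length, bytes) per rack.
    public static byte[] lengthPrefixEncode(List<String> racks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(racks.size());
        for (String rack : racks) {
            byte[] bytes = rack.getBytes(StandardCharsets.UTF_8);
            out.write(bytes.length);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<String> first = List.of("a;", "b");
        List<String> second = List.of("a", ";b");
        System.out.println(joinEncode(first).equals(joinEncode(second))); // true: collision
        System.out.println(Arrays.equals(
            lengthPrefixEncode(first), lengthPrefixEncode(second))); // false
    }
}
```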

Comment on lines +67 to +82
@Test
void testComputeTopicHashWithDifferentMagicByte() {
    long result = Group.computeTopicHash(FOO_METADATA_IMAGE.topics().getTopic(FOO_TOPIC_ID), FOO_METADATA_IMAGE.cluster());

    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 1) // different magic byte
        .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
        .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
        .putInt(FOO_NUM_PARTITIONS) // number of partitions
        .putInt(0) // partition 0
        .putString("rack0;rack1", StandardCharsets.UTF_8) // rack of partition 0
        .putInt(1) // partition 1
        .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
    assertNotEquals(topicHasher.hash().asLong(), result);
}

🛠️ Refactor suggestion | 🟠 Major

Apply same UUID change here.

-            .putByte((byte) 1) // different magic byte
-            .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+            .putByte((byte) 1) // different magic byte
+            .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
+            .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)

Comment on lines +88 to +100
    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
        .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
        .putInt(FOO_NUM_PARTITIONS) // number of partitions
        // different partition order
        .putInt(1) // partition 1
        .putString("rack1;rack2", StandardCharsets.UTF_8) // rack of partition 1
        .putInt(0) // partition 0
        .putString("rack0;rack1", StandardCharsets.UTF_8); // rack of partition 0
    assertNotEquals(topicHasher.hash().asLong(), result);
}

🛠️ Refactor suggestion | 🟠 Major

Apply same UUID change here.

-            .putByte((byte) 0) // magic byte
-            .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+            .putByte((byte) 0) // magic byte
+            .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
+            .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)

Comment on lines +106 to +117
    HashFunction hf = Hashing.murmur3_128();
    Hasher topicHasher = hf.newHasher()
        .putByte((byte) 0) // magic byte
        .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
        .putString(FOO_TOPIC_NAME, StandardCharsets.UTF_8) // topic name
        .putInt(FOO_NUM_PARTITIONS) // number of partitions
        .putInt(0) // partition 0
        .putString("rack1;rack0", StandardCharsets.UTF_8) // different rack order of partition 0
        .putInt(1) // partition 1
        .putString("rack1;rack2", StandardCharsets.UTF_8); // rack of partition 1
    assertNotEquals(topicHasher.hash().asLong(), result);
}

🛠️ Refactor suggestion | 🟠 Major

Apply same UUID change here.

-            .putByte((byte) 0) // magic byte
-            .putLong(FOO_TOPIC_ID.hashCode()) // topic Id
+            .putByte((byte) 0) // magic byte
+            .putLong(FOO_TOPIC_ID.getMostSignificantBits()) // topic Id (MSB)
+            .putLong(FOO_TOPIC_ID.getLeastSignificantBits()) // topic Id (LSB)

@visz11 visz11 closed this Oct 16, 2025
Labels

build size:L This PR changes 100-499 lines, ignoring generated files
